Skip to content

feat: rebalance associative bitwise/Add/Multiply chains to avoid protobuf recursion limit#4588

Open
schenksj wants to merge 4 commits into
apache:mainfrom
schenksj:fix/4577-rebalance-associative-binary
Open

feat: rebalance associative bitwise/Add/Multiply chains to avoid protobuf recursion limit#4588
schenksj wants to merge 4 commits into
apache:mainfrom
schenksj:fix/4577-rebalance-associative-binary

Conversation

@schenksj

@schenksj schenksj commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Closes #4577.

Follow-up to #4531 (deep And/Or chains). Protobuf's recursion limit (100) applies to
any deeply nested BinaryExpr, so a long left-deep chain of other associative operators
overflows the same way when the serialized plan is re-parsed.

What

Extend the rebalancing (flattenAssociative + a balanced O(log n)-depth tree) to:

  • BitwiseAnd / BitwiseOr / BitwiseXor — always integral and exactly associative, so
    they reuse the existing createBalancedBinaryExpr directly.
  • Add / Multiply — gated via isAssociativeAndRebalanceable to integral types in
    LEGACY (wrapping, modular) eval mode, the only exactly-associative case. Float isn't
    associative (Spark's ReorderAssociativeOperator excludes it too); ANSI/TRY make
    integer-overflow position observable and grouping changes it; decimal precision grows
    per op. Those keep the existing left-deep serialization. Add/Multiply emit a
    MathExpr (eval_mode + return_type) rather than a BinaryExpr, so a new
    createBalancedMathExpr builds the balanced tree with the chain's uniform type and
    eval mode at every inner node.

Tests

Mirror #4531: project 200-deep chains and assert Comet runs them natively with results
matching Spark (which also verifies the associativity guarantee).

Not addressed (by design)

Deep Add/Multiply chains that are ANSI/TRY, floating-point, or decimal are not
rebalanced — they aren't exactly associative, so regrouping could change the result or which
intermediate overflows. They keep the pre-existing left-deep serialization, so a sufficiently
deep such chain can still hit the protobuf recursion limit on re-parse, exactly as before this
PR. This is a deliberate correctness-over-depth trade-off, and strictly better than the prior
state (which had no rebalancing for any of these operators).

schenksj and others added 2 commits June 3, 2026 21:43
)

Follow-up to apache#4531 (deep And/Or chains). Protobuf's recursion limit (100) applies
to any deeply nested BinaryExpr, so a long left-deep chain of other associative
operators overflows the same way when the serialized plan is re-parsed.

Extend the rebalancing (flattenAssociative + a balanced O(log n)-depth tree) to:
  - BitwiseAnd / BitwiseOr / BitwiseXor: always integral and exactly associative,
    so they reuse the existing createBalancedBinaryExpr directly.
  - Add / Multiply: gated via isAssociativeAndRebalanceable to integral types in
    LEGACY (wrapping, modular) eval mode -- the only exactly-associative case. Float
    isn't associative (Spark's ReorderAssociativeOperator excludes it too); ANSI/TRY
    make integer overflow position (which the grouping changes) observable; decimal
    precision grows per op. Those keep the existing left-deep serialization. Add and
    Multiply emit a MathExpr (eval_mode + return_type) rather than a BinaryExpr, so a
    new createBalancedMathExpr builds the balanced tree with the chain's uniform type
    and eval mode at every inner node.

Tests mirror apache#4531: project 200-deep chains and assert Comet runs them natively with
results matching Spark (which also verifies the associativity guarantee).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…associative-binary

# Conflicts:
#	spark/src/main/scala/org/apache/comet/serde/arithmetic.scala
#	spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
@andygrove

Copy link
Copy Markdown
Member

Nice extension of the #4531 rebalancing. I read through the compatibility reasoning carefully and it holds up. A few notes and questions below.

Compatibility

The load-bearing claim is that serializing every inner node of the balanced tree with a single returnType/evalMode is valid, and it is. Whenever Spark's type coercion widens an operand it inserts a Cast, and Cast doesn't match the flattenAssociative predicate, so it terminates flattening. That means every leaf within a maximal chain already shares that chain's type, and the single returnType/evalMode is exactly what the left-deep serialization would have emitted node by node. A mixed-width case like byteCol + byteCol + intCol stops flattening at the Cast(..., Int) boundary, so the inner byte add is serialized separately with its own type.

The integral + LEGACY gate is also correct. Modular add/multiply are associative at every integer width, so regrouping yields the same residue, and the float/ANSI/TRY/decimal exclusions are all justified. I couldn't find a case where the rebalanced tree diverges from Spark.

Questions / suggestions

  1. The gate means deep Add/Multiply chains that are ANSI, float, or decimal are still serialized left-deep and will still overflow the recursion limit on re-parse. That's unavoidable given they aren't safely associative, and it's strictly better than before. Could you confirm those cases degrade gracefully (fall back) rather than crash, and maybe add a one-line note in the description that they remain unfixed by design?

  2. In the deep bitwise test, the col("_1") + lit(i) leaves are Add nodes that take the non-rebalanced serde under default ANSI on Spark 4.0. The values stay small so nothing throws and the bitwise chain is still what's under test, so it's harmless. A short comment noting the leaves are intentionally cheap adds would save a future reader a head-scratch.

  3. Optional: createBalancedMathExpr duplicates most of createBalancedBinaryExpr, differing only in the inner-node builder. The recursive build could be factored into one shared helper if you felt like it, but the current form reads fine and matches the existing style.

Overall this looks correct and tightly scoped. The multiply test is a nice touch since 200 terms genuinely overflow and wrap, exercising the associativity guarantee rather than staying in a safe range.

@andygrove andygrove left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up on this @schenksj

Addresses review feedback on apache#4588: note that the col + lit leaves take the
non-rebalanced serde but stay small so nothing throws -- the bitwise chain is
what's under test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@schenksj

schenksj commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the careful read, @andygrove — all three addressed:

1. ANSI / float / decimal deep chains. Added a "Not addressed (by design)" section to the description. On the fallback question, to be precise rather than reassuring: these don't degrade gracefully — they retain the exact pre-PR behavior. Serde still succeeds (it just emits the left-deep proto), so there's no plan-time fallback trigger, and Comet doesn't raise prost's default recursion limit on the native side (nor is there a runtime-fallback path), so a sufficiently deep such chain still errors on re-parse (JVM parseFrom / native prost decode) exactly as before #4577. So this PR neither fixes nor regresses them — it's strictly additive for the safely-associative cases. If we wanted those to fall back cleanly instead of erroring, that'd be a separate change (e.g. a depth guard that declines to native), which I'm happy to file as a follow-up if you think it's worth it.

2. Deep-bitwise test leaves. Added a comment noting the col("_1") + lit(i) leaves are intentionally cheap Adds — they take the non-rebalanced serde but stay small so nothing overflows/throws; the deep bitwise chain is what's under test. (commit pushed)

3. Shared build helper. Left as-is per your note that the current form reads fine and matches the existing style — the two builders differ only in the inner-node construction (MathExpr with eval_mode/return_type vs plain BinaryExpr), and keeping each self-contained avoids threading a node-builder callback through a shared recursion. Happy to factor it out if you'd prefer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rebalance deep associative binary expression chains (Add, Multiply, bitwise) to avoid protobuf recursion limit

2 participants